An accurate home price prediction algorithm can reduce volatility in the housing market and account for factors that may not be reflected in a home’s previous selling prices (e.g., new roof, new shopping center). However, predictive algorithms can also be exceedingly difficult to perfect. A falsely high average estimate in a neighborhood might lead sellers to list their homes at too high an asking price, dragging out the sale and introducing friction into the housing market. A falsely low estimate may depress the value of what is often a homeowner’s most valuable asset.
This project attempts to predict housing prices in metropolitan Miami by considering a home’s own features (e.g., fence, patio) as well as local amenities and external features such as schools, parks, and access to major roads.
To create our model, we converted our features of interest into variables that can be fed into an ordinary least squares (OLS) regression model. We tested each feature for correlation with home sale prices and fine-tuned the model to minimize prediction error.
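The two steps described above, screening a feature by its correlation with sale price and then fitting an OLS model, can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the six homes, their prices, and the helper names `pearson` and `ols` are all hypothetical, and the prices are generated without noise so that the fitted coefficients are easy to check.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ols(rows, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical toy data: (LotSize, Pool) per home, not taken from the report.
homes = [(5000, 0), (6000, 1), (7000, 0), (5500, 1), (8000, 0), (6500, 1)]
# Synthetic, noise-free prices so the true coefficients are known in advance.
prices = [100_000 + 20 * lot + 50_000 * pool for lot, pool in homes]

r_lot = pearson([lot for lot, _ in homes], prices)  # screening step
coefs = ols(homes, prices)                          # [intercept, LotSize, Pool]
```

With real data the coefficients would not be recovered exactly, and a statistics package would also report standard errors and significance levels like those in the regression table.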
One interesting finding from this process is that the middle school zone in which a house is located is associated with its sale price.
| Statistic | N | Mean | St. Dev. | Min | Max |
|---|---|---|---|---|---|
| SalePrice | 2,066 | 405,476.400 | 199,741.700 | 12,500 | 1,000,000 |
| LotSize | 2,066 | 6,360.875 | 1,721.617 | 1,250 | 17,620 |
| Age | 2,066 | 70.954 | 18.186 | -1 | 115 |
| Stories | 2,066 | 1.073 | 0.265 | 0 | 3 |
| Bed | 2,066 | 2.692 | 0.794 | 0 | 8 |
| Bath | 2,066 | 1.611 | 0.700 | 0 | 6 |
| Pool | 2,066 | 0.108 | 0.310 | 0 | 1 |
| Fence | 2,066 | 0.738 | 0.440 | 0 | 1 |
| Patio | 2,066 | 0.499 | 0.500 | 0 | 1 |
| Shore1 | 2,066 | 7,047.549 | 5,248.614 | 88.597 | 26,528.540 |
| MedRent | 2,040 | 1,042.535 | 311.133 | 246.000 | 2,297.000 |
| pctWhite | 2,062 | 0.703 | 0.320 | 0.057 | 0.989 |
| pctPoverty | 2,062 | 0.217 | 0.108 | 0.052 | 0.556 |
| Brownsville.MS | 1,588 | 0.098 | 0.298 | 0.000 | 1.000 |
| CitrusGrove.MS | 1,588 | 0.115 | 0.319 | 0.000 | 1.000 |
| JosedeDiego.MS | 1,588 | 0.129 | 0.335 | 0.000 | 1.000 |
| GeorgiaJA.MS | 1,588 | 0.133 | 0.340 | 0.000 | 1.000 |
| KinlochPk.MS | 1,588 | 0.196 | 0.397 | 0.000 | 1.000 |
| Madison.MS | 1,588 | 0.001 | 0.035 | 0.000 | 1.000 |
| Nautilus.MS | 1,588 | 0.061 | 0.240 | 0.000 | 1.000 |
| Shenandoah.MS | 1,588 | 0.243 | 0.429 | 0.000 | 1.000 |
| WestMiami.MS | 1,588 | 0.024 | 0.153 | 0.000 | 1.000 |
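
The middle school rows in the summary table are 0/1 indicator variables, so each mean is the share of homes in that category (for example, the Pool mean of 0.108 corresponds to about 10.8% of homes having a pool). A minimal sketch of how a categorical school assignment might be turned into such indicators is shown below. The helper `one_hot` and the five-home sample are hypothetical, and dropping one reference level is an assumption made here to avoid perfect collinearity with the intercept.

```python
def one_hot(values, reference):
    """Build 0/1 indicator columns for every level except the reference level."""
    levels = sorted(set(values) - {reference})
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in levels}

# Hypothetical school assignments for five homes (not the report's data).
schools = ["Shenandoah", "KinlochPk", "Shenandoah", "WestMiami", "Nautilus"]

# "WestMiami" is treated as the reference level in this sketch.
dummies = one_hot(schools, reference="WestMiami")

# The mean of each indicator column is the share of homes in that zone.
shares = {lvl: sum(col) / len(col) for lvl, col in dummies.items()}
```

Each coefficient on an indicator is then read relative to the omitted reference category.
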
| Dependent variable: SalePrice | (1) | (2) |
|---|---|---|
| Folio | | 0.00000 |
| | | (0.00000) |
| Property.CityMiami Beach | | 220,589.400** |
| | | (102,729.500) |
| LotSize | | 17.974*** |
| | | (1.660) |
| Bed | | 8,653.610* |
| | | (4,483.327) |
| Bath | | 4,613.343 |
| | | (5,440.150) |
| Stories | | 13,854.920 |
| | | (11,214.380) |
| Pool | | 77,281.650*** |
| | | (9,820.892) |
| Fence | | -149.050 |
| | | (5,646.349) |
| Patio | | 4,073.120 |
| | | (5,115.939) |
| ActualSqFt | | 67.033*** |
| | | (6.231) |
| Age | | -698.975*** |
| | | (147.148) |
| Shore1 | -5.745*** | -3.369*** |
| | (1.229) | (1.030) |
| MedHHInc | 1.370*** | 1.091*** |
| | (0.234) | (0.193) |
| TotalPop | 4.453** | 3.400** |
| | (1.797) | (1.481) |
| MedRent | 8.011 | 10.111 |
| | (16.927) | (13.970) |
| pctWhite | 87,738.750*** | 72,584.590*** |
| | (21,213.910) | (18,038.400) |
| pctPoverty | -65,160.580 | -28,329.320 |
| | (46,955.060) | (38,669.420) |
| Brownsville.MS | -74,321.900** | -19,595.140 |
| | (35,665.230) | (29,848.970) |
| CitrusGrove.MS | -63,085.380* | -16,536.970 |
| | (33,757.230) | (27,908.010) |
| JosedeDiego.MS | -24,574.030 | 41,689.690 |
| | (36,087.170) | (30,171.430) |
| GeorgiaJA.MS | -83,659.540** | -21,745.790 |
| | (33,968.440) | (28,569.970) |
| KinlochPk.MS | -20,898.120 | 7,151.547 |
| | (23,871.550) | (19,733.870) |
| Madison.MS | -102,752.000 | -24,901.940 |
| | (89,066.670) | (73,254.900) |
| Nautilus.MS | 229,234.000*** | |
| | (37,774.810) | |
| Shenandoah.MS | 99,482.560*** | 122,292.100*** |
| | (31,097.140) | (25,991.120) |
| WestMiami.MS | | |
| Constant | 274,559.400*** | -5,246.435 |
| | (50,967.110) | (142,697.300) |
| Observations | 1,584 | 1,584 |
| R2 | 0.603 | 0.736 |
| Adjusted R2 | 0.599 | 0.732 |
| Residual Std. Error | 113,746.600 (df = 1569) | 93,070.580 (df = 1559) |
| F Statistic | 170.158*** (df = 14; 1569) | 180.949*** (df = 24; 1559) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | |
Our best model was reg.cv2.
In the first regression, we combined our engineered features to see which were statistically significant. The second regression adds all of the off-the-shelf features to our custom features. Doing so improves the fit substantially: R2 rises from 0.603 to 0.736, and the residual standard error falls from about 113,747 to about 93,071.
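Because the second regression has more predictors, adjusted R2 is the fairer comparison, since it penalizes each added variable. As a quick check, the standard formula, adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1), can be applied to the values reported in the regression table (n = 1,584 observations, k = 14 and k = 24 predictors). The function name `adjusted_r2` is illustrative only.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Values taken from the regression table: R2, observations, number of predictors.
adj1 = adjusted_r2(0.603, 1584, 14)  # regression (1)
adj2 = adjusted_r2(0.736, 1584, 24)  # regression (2)

print(round(adj1, 3), round(adj2, 3))  # prints: 0.599 0.732
```

Both results match the Adjusted R2 row of the table, so the improvement is not simply an artifact of adding ten more predictors.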
| intercept | RMSE | Rsquared | MAE | RMSESD | RsquaredSD | MAESD |
|---|---|---|---|---|---|---|
| TRUE | 130147.8 | 0.7933946 | 93060.55 | 354594.9 | 0.2778949 | 210570.7 |
| intercept | RMSE | Rsquared | MAE | RMSESD | RsquaredSD | MAESD |
|---|---|---|---|---|---|---|
| TRUE | 98912.42 | 0.709759 | 82492.1 | 52113.41 | 0.271611 | 39852.27 |
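
The columns in the two tables above (RMSE, MAE, and their SD counterparts) appear to be cross-validation summaries: the average error across held-out folds and the standard deviation of that error from fold to fold. The sketch below shows how such numbers can be produced with k-fold cross-validation. It is not the project's code: the data are synthetic, the one-variable model is a stand-in for the full regression, and the choice of 5 folds is an assumption, since the source does not state how many folds were used.

```python
import math
import random
import statistics

def fit_line(xs, ys):
    """Simple one-variable least-squares fit; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

def k_fold_cv(xs, ys, k=5, seed=0):
    """Return mean and standard deviation of RMSE and MAE across k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint held-out sets
    rmses, maes = [], []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errs = [ys[i] - (b0 + b1 * xs[i]) for i in held_out]
        rmses.append(math.sqrt(sum(e * e for e in errs) / len(errs)))
        maes.append(sum(abs(e) for e in errs) / len(errs))
    return {
        "RMSE": statistics.mean(rmses), "MAE": statistics.mean(maes),
        "RMSESD": statistics.stdev(rmses), "MAESD": statistics.stdev(maes),
    }

# Synthetic data (hypothetical): price depends on square footage plus noise.
rng = random.Random(1)
sqft = [rng.uniform(800, 3000) for _ in range(100)]
price = [50_000 + 150 * s + rng.gauss(0, 20_000) for s in sqft]

cv = k_fold_cv(sqft, price)
```

A large RMSESD relative to RMSE, as in the first table, suggests that the model's error varies widely between folds, which is worth weighing alongside the average error when comparing models.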